Decompilation aims to transform a low-level program language (LPL) (e.g., a binary file) into its functionally equivalent high-level program language (HPL) (e.g., C/C++). It is a core technology in software security, especially in vulnerability discovery and malware analysis. In recent years, following the success of neural machine translation (NMT) models in natural language processing (NLP), researchers have tried to build neural decompilers by borrowing ideas from NMT. They formulate decompilation as a translation problem between LPL and HPL, aiming to reduce the human cost of developing decompilation tools and to improve their generalizability. However, state-of-the-art learning-based decompilers do not cope well with compiler-optimized binaries. Since real-world binaries are mostly compiler-optimized, decompilers that do not consider optimized binaries have limited practical significance. In this paper, we propose a novel learning-based approach named NeurDP that targets compiler-optimized binaries. NeurDP uses a graph neural network (GNN) model to convert LPL to an intermediate representation (IR), which bridges the gap between source code and optimized binary. We also design an Optimized Translation Unit (OTU) to split functions into smaller code fragments for better translation performance. Evaluation results on datasets containing various types of statements show that NeurDP can decompile optimized binaries with 45.21% higher accuracy than state-of-the-art neural decompilation frameworks.
translated by Google Translate
Label smoothing is a regularization technique widely used in supervised learning to improve the generalization of models on various tasks, such as image classification and machine translation. However, the effectiveness of label smoothing in multi-hop question answering (MHQA) has yet to be well studied. In this paper, we systematically analyze the role of label smoothing on various modules of MHQA and propose F1 smoothing, a novel label smoothing technique specifically designed for machine reading comprehension (MRC) tasks. We evaluate our method on the HotpotQA dataset and demonstrate its superiority over several strong baselines, including models that utilize complex attention mechanisms. Our results suggest that label smoothing can be effective in MHQA, but the choice of smoothing strategy can significantly affect performance.
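For readers unfamiliar with the baseline technique, the following is a minimal sketch of standard uniform label smoothing, not of the F1 smoothing proposed in the abstract, whose details are not given here. The function names and the value `epsilon=0.1` are illustrative assumptions.

```python
import math

def smooth_labels(one_hot, epsilon=0.1):
    """Uniform label smoothing: mix a one-hot target with a uniform distribution.

    epsilon is a hypothetical smoothing strength chosen for illustration.
    """
    k = len(one_hot)
    return [(1.0 - epsilon) * y + epsilon / k for y in one_hot]

def cross_entropy(target, probs):
    """Cross-entropy between a (possibly smoothed) target and predicted probabilities."""
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)

# A 4-class example where class 1 is the gold label.
hard = [0.0, 1.0, 0.0, 0.0]
smoothed = smooth_labels(hard, epsilon=0.1)

# An overconfident prediction is penalized more under the smoothed target.
confident = [0.001, 0.997, 0.001, 0.001]
loss_hard = cross_entropy(hard, confident)
loss_smoothed = cross_entropy(smoothed, confident)
```

With smoothing, some target mass sits on the non-gold classes, so a prediction that drives their probabilities toward zero incurs a larger loss than it would against the hard target.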
Recent efforts in Neural Radiance Fields (NeRF) have shown impressive results on novel view synthesis by using implicit neural representations to represent 3D scenes. Due to the process of volumetric rendering, the inference speed of NeRF is extremely slow, limiting its application on resource-constrained hardware such as mobile devices. Many works have sought to reduce the latency of running NeRF models. However, most of them still require a high-end GPU for acceleration or extra storage memory, both of which are unavailable on mobile devices. Another emerging direction uses the neural light field (NeLF) for speedup, as only one forward pass is performed on a ray to predict the pixel color. Nevertheless, to reach a rendering quality similar to NeRF, the network in NeLF is designed with intensive computation, which is not mobile-friendly. In this work, we propose an efficient network that runs in real time on mobile devices for neural rendering. We follow the setting of NeLF to train our network. Unlike existing works, we introduce a novel network architecture that runs efficiently on mobile devices with low latency and small size, i.e., saving $15\times \sim 24\times$ storage compared with MobileNeRF. Our model achieves high-resolution generation while maintaining real-time inference for both synthetic and real-world scenes on mobile devices, e.g., $18.04$ms (iPhone 13) for rendering one $1008\times756$ image of real 3D scenes. Additionally, we achieve image quality similar to NeRF and better quality than MobileNeRF (PSNR $26.15$ vs. $25.91$ on the real-world forward-facing dataset).
The image-based head swapping task aims to stitch a source head onto another source body flawlessly. This seldom-studied task faces two major challenges: 1) preserving the head and body from different sources while generating a seamless transition region, and 2) the lack of a paired head swapping dataset and benchmark so far. In this paper, we propose an image-based head swapping framework (HS-Diffusion) which consists of a semantic-guided latent diffusion model (SG-LDM) and a semantic layout generator. We blend the semantic layouts of the source head and source body, and then inpaint the transition region with the semantic layout generator, achieving coarse-grained head swapping. SG-LDM further implements fine-grained head swapping, conditioned on the blended layout, through a progressive fusion process, while preserving the source head and source body with high-quality reconstruction. To this end, we design a head-cover augmentation strategy for training and a neck alignment trick for geometric realism. Importantly, we construct a new image-based head swapping benchmark and propose two tailor-designed metrics (Mask-FID and Focal-FID). Extensive experiments demonstrate the superiority of our framework. The code will be available at https://github.com/qinghew/HS-Diffusion.
The security of artificial intelligence (AI) is an important research area for building safe, reliable, and trustworthy AI systems. To accelerate research on AI security, the Artificial Intelligence Security Competition (AISC) was organized by the Zhongguancun Laboratory, China Industrial Control Systems Cyber Emergency Response Team, Institute for Artificial Intelligence, Tsinghua University, and RealAI as part of the Zhongguancun International Frontier Technology Innovation Competition (https://www.zgc-aisc.com/en). The competition consists of three tracks: the Deepfake Security Competition, the Autonomous Driving Security Competition, and the Face Recognition Security Competition. This report introduces the competition rules of these three tracks and the solutions of the top-ranking teams in each track.
Frozen pretrained models have become a viable alternative to the pretraining-then-finetuning paradigm for transfer learning. However, with frozen models there are relatively few parameters available for adapting to downstream tasks, which is problematic in computer vision where tasks vary significantly in input/output format and the type of information that is of value. In this paper, we present a study of frozen pretrained models when applied to diverse and representative computer vision tasks, including object detection, semantic segmentation and video action recognition. From this empirical analysis, our work answers the questions of what pretraining task fits best with this frozen setting, how to make the frozen setting more flexible to various downstream tasks, and the effect of larger model sizes. We additionally examine the upper bound of performance using a giant frozen pretrained model with 3 billion parameters (SwinV2-G) and find that it reaches competitive performance on a varied set of major benchmarks with only one shared frozen base network: 60.0 box mAP and 52.2 mask mAP on COCO object detection test-dev, 57.6 val mIoU on ADE20K semantic segmentation, and 81.7 top-1 accuracy on Kinetics-400 action recognition. With this work, we hope to bring greater attention to this promising path of freezing pretrained image models.
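Event extraction (EE) is an essential task in information extraction that aims to extract structured event information from unstructured text. Most prior work focuses on extracting flat events while neglecting overlapped or nested ones. The few models for overlapped and nested EE include several successive stages to extract event triggers and arguments, and these stages suffer from error propagation. Therefore, we design a simple yet effective tagging scheme and model, called OneEE, that formulates EE as word-word relation recognition. The relations between trigger or argument words are recognized simultaneously in one stage with parallel grid tagging, yielding a very fast event extraction speed. The model is equipped with an adaptive event fusion module to generate event-aware representations and a distance-aware predictor to integrate relative distance information for word-word relation recognition, both of which are empirically shown to be effective mechanisms. Experiments on 3 overlapped and nested EE benchmarks, namely FewFC, GENIA11, and GENIA13, show that OneEE achieves state-of-the-art (SOTA) results. Moreover, the inference speed of OneEE is faster than that of the baselines under the same conditions, and it can be further improved since OneEE supports parallel inference.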
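Multi-view learning has progressed rapidly in recent years. Although many previous studies assume that each instance appears in all views, it is common in real-world applications for instances to be missing from some views, resulting in incomplete multi-view data. To tackle this problem, we propose a novel Latent Heterogeneous Graph Network (LHGN) for incomplete multi-view learning, which aims to use multiple incomplete views as fully as possible in a flexible manner. By learning a unified latent representation, a trade-off between consistency and complementarity among different views is implicitly realized. To explore the complex relationship between samples and latent representations, a neighborhood constraint and a view constraint are proposed, for the first time, to construct a heterogeneous graph. Finally, to avoid any inconsistency between the training and test phases, a transductive learning technique is applied to the classification task based on graph learning. Extensive experimental results on real-world datasets demonstrate the effectiveness of our model over existing state-of-the-art approaches.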
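We present a novel approach to re-localization, or place recognition, a fundamental problem to be solved in many robotics, automation, and AR applications. Rather than relying on often unstable appearance information, we consider the situation in which the reference map is given in the form of localized objects. Our localization framework relies on 3D semantic object detections, which are then associated with objects in the map. Sets of possible pairwise associations are grown by hierarchical clustering using a merge metric that evaluates spatial compatibility. The latter notably uses information about relative object configurations, which is invariant with respect to global transformations. Association sets are updated and expanded as the camera incrementally explores the environment and detects more objects. We test our algorithm in several challenging situations, including dynamic scenes, large viewpoint changes, and scenes with repeated instances. Our experiments demonstrate that our approach outperforms prior art in terms of both robustness and accuracy.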
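The invariance of relative object configurations can be illustrated with a small sketch. It assumes, purely for illustration, that each object is reduced to a 3D centroid and that two detection-to-map associations count as spatially compatible when the distance between the two detections matches the distance between the two map objects. The abstract does not specify the actual merge metric, so `compatible`, `consistent_set`, `rigid_transform`, and the tolerance `tol` are hypothetical.

```python
import math
from itertools import combinations

def compatible(pair_a, pair_b, tol=0.1):
    """Each pair is (detected_centroid, map_centroid).

    Assumed criterion: inter-object distances must agree within tol, since
    distances are preserved under any global rigid transformation.
    """
    (det_a, map_a), (det_b, map_b) = pair_a, pair_b
    return abs(math.dist(det_a, det_b) - math.dist(map_a, map_b)) <= tol

def consistent_set(pairs, tol=0.1):
    """True if every two associations in the set are mutually compatible."""
    return all(compatible(a, b, tol) for a, b in combinations(pairs, 2))

def rigid_transform(p, yaw, t):
    """Rotate point p about the z-axis by yaw (radians), then translate by t."""
    x, y, z = p
    c, s = math.cos(yaw), math.sin(yaw)
    return (c * x - s * y + t[0], s * x + c * y + t[1], z + t[2])

# Map objects, and the same objects observed in an unknown camera frame.
map_objects = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 3.0, 1.0)]
detections = [rigid_transform(p, 0.7, (5.0, -2.0, 0.5)) for p in map_objects]

correct = list(zip(detections, map_objects))
wrong = [(detections[0], map_objects[0]), (detections[1], map_objects[2])]

correct_ok = consistent_set(correct)  # distances agree despite the unknown pose
wrong_ok = consistent_set(wrong)      # distances disagree for the mismatched pair
```

Because the check uses only distances between objects, it needs no estimate of the camera pose, which is what makes such a test usable before localization has succeeded.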
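With the rapidly growing number of videos, there is a great demand for techniques that help people quickly navigate to the video segments they are interested in. However, current work on video understanding mainly focuses on video content summarization, while little effort has been made to explore the structure of a video. Inspired by textual outline generation, we introduce a novel video understanding task, namely video outline generation (VOG). The task is defined to contain two sub-tasks: (1) first segmenting the video according to its content structure, and then (2) generating a heading for each segment. To learn and evaluate VOG, we annotate a 10k+ dataset, called DuVOG. Specifically, we use OCR tools to recognize the subtitles of videos. Annotators are then asked to divide the subtitles into chapters and give each chapter a title. In videos, highlighted text tends to be the heading, since it is more likely to attract attention. Therefore, we propose a Visual Subtitle feature Enhanced video outline generation model (VSENet), which takes as input the textual subtitles together with their visual font sizes and positions. We treat the VOG task as a sequence tagging problem that extracts the spans where the headings are located and then rewrites them to form the final outline. Furthermore, based on the similarity between video outlines and textual outlines, we use a large number of articles with chapter headings to pretrain our model. Experiments on DuVOG show that our model outperforms other baseline methods by a large margin, achieving an F1 score of 77.1 at the video segmentation level and a ROUGE-L_F0.5 of 85.0 at the heading generation level.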